Behaviour-Conditioned Policies for Cooperative Reinforcement Learning Tasks

Abstract

Cooperation among AI systems, and between AI systems and humans, is becoming increasingly important. In various real-world tasks, an agent needs to cooperate with unknown partner agent types. This requires the agent to assess the behaviour of the partner agent during a cooperative task and to adjust its own policy to support the cooperation. Deep reinforcement learning models can be trained to deliver the required functionality but are known to suffer from sample inefficiency and slow learning. However, adapting to a partner agent's behaviour during the ongoing task requires the ability to assess the partner agent type quickly. We suggest a method where we synthetically produce populations of agents with different behavioural patterns, together with ground truth data of their behaviour, and use this data for training a meta-learner. We additionally suggest an agent architecture which can efficiently use the generated data and gain the meta-learning capability. When an agent is equipped with such a meta-learner, it is capable of quickly adapting to cooperation with unknown partner agent types in new situations. This method can be used to automatically form a task distribution for meta-training from emerging behaviours that arise, for example, through self-play.

Similar resources

Constructing Stochastic Mixture Policies for Episodic Multiobjective Reinforcement Learning Tasks

Multiobjective reinforcement learning algorithms extend reinforcement learning techniques to problems with multiple conflicting objectives. This paper discusses the advantages gained from applying stochastic policies to multiobjective tasks and examines a particular form of stochastic policy known as a mixture policy. Two methods are proposed for deriving mixture policies for episodic multiobje...

Reinforcement Learning of Cooperative Persuasive Dialogue Policies using Framing

In this paper, we apply reinforcement learning for automatically learning cooperative persuasive dialogue system policies using framing, the use of emotionally charged statements common in persuasive dialogue between humans. In order to apply reinforcement learning, we describe a method to construct user simulators and reward functions specifically tailored to persuasive dialogue based on a cor...

Imitative Policies for Reinforcement Learning

We discuss a reinforcement learning framework where learners observe experts interacting with the environment. Our approach is to construct from these observations exploratory policies which favor selection of actions the expert has taken. This imitation strategy can be applied at any stage of learning, and requires neither that information regarding reinforcement be conveyed from the expert to...

Cooperative Inverse Reinforcement Learning

For an autonomous system to be helpful to humans and to pose no unwarranted risks, it needs to align its values with those of the humans in its environment in such a way that its actions contribute to the maximization of value for the humans. We propose a formal definition of the value alignment problem as cooperative inverse reinforcement learning (CIRL). A CIRL problem is a cooperative, parti...

Behaviour-Based Reinforcement Learning

Although behaviour-based robotics has been successfully used to develop autonomous mobile robots up to a certain point, further progress may require the integration of a learning model into the behaviour-based framework. Reinforcement learning is a natural candidate for this because it seems well suited to the problems faced by autonomous agents. However, previous attempts to use reinforcement ...

Journal

Journal title: Lecture Notes in Computer Science

Year: 2021

ISSN: 1611-3349, 0302-9743

DOI: https://doi.org/10.1007/978-3-030-86380-7_40